OCR exploration server

Warning: this site is under development.
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Design and development of an ancient Chinese document recognition system

Internal identifier: 001641 (Main/Exploration); previous: 001640; next: 001642

Authors: Liangrui Peng [People's Republic of China]; Pingping Xiu [People's Republic of China]; Xiaoqing Ding [People's Republic of China]

Source: SPIE proceedings series, 2004 (ISSN 1017-2653)

RBID : Pascal:04-0471355

Abstract

The digitization of ancient Chinese documents presents new challenges to the OCR (optical character recognition) research field because of the large character set of ancient Chinese, the variety of font types, and the diversity of document layout styles; these documents reflect thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognizing ancient Chinese documents with regular font types and layout styles. Building on previous work on multilingual OCR in the TH-OCR system, we focus on the design and development of two key technologies: character recognition and page segmentation. Experimental results show that the newly developed character recognition kernel, which covers 19,635 Chinese characters, outperforms our original traditional Chinese recognition kernel. A benchmark test on printed ancient Chinese books shows that the proposed system is effective for regular ancient Chinese documents.


The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">04-0471355</idno>
<date when="2004">2004</date>
<idno type="stanalyst">PASCAL 04-0471355 INIST</idno>
<idno type="RBID">Pascal:04-0471355</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000530</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000260</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000493</idno>
<idno type="wicri:doubleKey">1017-2653:2004:Peng L:design:and:development</idno>
<idno type="wicri:Area/Main/Merge">001704</idno>
<idno type="wicri:Area/Main/Curation">001641</idno>
<idno type="wicri:Area/Main/Exploration">001641</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="1">
<inist:fA14 i1="01">
<s1>State Key Laboratory of Intelligent Technology and Systems, Dept. of Electronic Engineering, Tsinghua University</s1>
<s2>Beijing, 100084</s2>
<s3>CHN</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
<imprint>
<date when="2004">2004</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">SPIE proceedings series</title>
<idno type="ISSN">1017-2653</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Ancient chinese</term>
<term>Chinese</term>
<term>Digitizing</term>
<term>Optical character recognition</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Numérisation</term>
<term>Chinois</term>
<term>Reconnaissance optique caractère</term>
<term>Ancien chinois</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Numérisation</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">The digitization of ancient Chinese documents presents new challenges to OCR (Optical Character Recognition) research field due to the large character set of ancient Chinese characters, variant font types, and versatile document layout styles, as these documents are historical reflections to the thousands of years of Chinese civilization. After analyzing the general characteristics of ancient Chinese documents, we present a solution for recognition of ancient Chinese documents with regular font-types and layout-styles. Based on the previous work on multilingual OCR in TH-OCR system, we focus on the design and development of two key technologies which include character recognition and page segmentation. Experimental results show that the developed character recognition kernel of 19,635 Chinese characters outperforms our original traditional Chinese recognition kernel; Benchmarked test on printed ancient Chinese books proves that the proposed system is effective for regular ancient Chinese documents.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>République populaire de Chine</li>
</country>
</list>
<tree>
<country name="République populaire de Chine">
<noRegion>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
</noRegion>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001641 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001641 | SxmlIndent | more
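
Outside a Dilib installation, the record shown above can also be read with a general-purpose XML library. The following is a minimal Python sketch, not part of Dilib. It assumes a trimmed copy of the record held in a string, and it makes one point explicit: the `wicri:` and `inist:` prefixes are not declared in the record, so a strict parser rejects it until they are bound. The namespace URIs used here (`urn:example:wicri`, `urn:example:inist`) and the helper names `parse_record` and `summarize` are hypothetical placeholders chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the record above (title statement only, affiliations shortened).
RECORD = """<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Design and development of an ancient Chinese document recognition system</title>
<author>
<name sortKey="Peng, Liangrui" sort="Peng, Liangrui" uniqKey="Peng L" first="Liangrui" last="Peng">Liangrui Peng</name>
<affiliation wicri:level="1">
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Xiu, Pingping" sort="Xiu, Pingping" uniqKey="Xiu P" first="Pingping" last="Xiu">Pingping Xiu</name>
<affiliation wicri:level="1">
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Ding, Xiaoqing" sort="Ding, Xiaoqing" uniqKey="Ding X" first="Xiaoqing" last="Ding">Xiaoqing Ding</name>
<affiliation wicri:level="1">
<country>République populaire de Chine</country>
<wicri:noRegion>Beijing, 100084</wicri:noRegion>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>"""

# Assumption: placeholder URIs, used only to make the prefixes resolvable.
NS_DECLS = 'xmlns:wicri="urn:example:wicri" xmlns:inist="urn:example:inist"'


def parse_record(text):
    """Bind the undeclared prefixes on the root element, then parse."""
    patched = text.replace("<record>", "<record " + NS_DECLS + ">", 1)
    return ET.fromstring(patched)


def summarize(root):
    """Return the article title and the author names from the title statement."""
    title_stmt = root.find("TEI/teiHeader/fileDesc/titleStmt")
    title = title_stmt.findtext("title")
    authors = [name.text for name in title_stmt.findall("author/name")]
    return title, authors


if __name__ == "__main__":
    title, authors = summarize(parse_record(RECORD))
    print(title)
    print("; ".join(authors))
```

Parsing `RECORD` directly with `ET.fromstring` raises `ET.ParseError` because of the unbound prefixes, which is why the sketch patches the root element first. Reading from the title statement rather than from `analytic` avoids listing each author twice, since the record repeats the author block in both places.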

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:04-0471355
   |texte=   Design and development of an ancient Chinese document recognition system
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024